Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in
distributed environments, where data are distributed across multiple computing
nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that
they allow new components to be introduced on the fly as needed. This, however,
poses an important challenge to distributed estimation -- how to handle new
components efficiently and consistently. To tackle this problem, we propose a
new estimation method, which allows new components to be created locally in
individual computing nodes. Components corresponding to the same cluster will
be identified and merged via a probabilistic consolidation scheme. In this way,
we can maintain the consistency of estimation with very low communication cost.
Experiments on large real-world data sets show that the proposed method can
achieve high scalability in distributed and asynchronous environments without
compromising the mixing performance.
Comment: This paper was published at IJCAI 2017.
https://www.ijcai.org/proceedings/2017/64
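To illustrate what "new components introduced on the fly" means, the following is a minimal single-machine sketch of the Chinese restaurant process, the standard sequential view of the cluster assignments in a Dirichlet process mixture. It is not the distributed estimation method proposed in the paper; the function name `crp_assignments` and the parameter names are hypothetical and chosen only for this example.

```python
import random

def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process prior.

    alpha > 0 is the concentration parameter: larger values make it
    more likely that a point opens a new component.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of points in component k
    labels = []   # labels[i] = component index of point i
    for _ in range(n_points):
        # Existing components are weighted by their size; a brand-new
        # component is weighted by alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new component on the fly
        else:
            counts[k] += 1        # join an existing component
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(n_points=200, alpha=1.0)
print(len(counts), "components for", len(labels), "points")
```

In a distributed setting, each node running such a process independently may open components that describe the same cluster, which is the situation the consolidation scheme in the abstract is meant to resolve.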
Integrating Specialized Classifiers Based on Continuous Time Markov Chain
Specialized classifiers, namely those dedicated to a subset of classes, are
often adopted in real-world recognition systems. However, integrating such
classifiers is nontrivial. Existing methods, e.g. weighted average, usually
implicitly assume that all constituents of an ensemble cover the same set of
classes. Such methods can produce misleading predictions when used to combine
specialized classifiers. This work explores a novel approach. Instead of
combining predictions from individual classifiers directly, it first decomposes
the predictions into sets of pairwise preferences, treating them as transition
channels between classes, then constructs a continuous-time Markov
chain and uses the equilibrium distribution of this chain as the final
prediction. This allows us to form a coherent picture over all specialized
predictions. On large public datasets, the proposed method obtains considerable
improvement compared to mainstream ensemble methods, especially when the
classifier coverage is highly unbalanced.
Comment: Published at IJCAI-17, typo fixed
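The idea of turning pairwise preferences into a Markov chain can be sketched as follows. This is a simplified illustration, not the construction from the paper: the rule for setting transition rates (each classifier adds a rate from class i to class j equal to the probability it assigns to j, for every pair of classes it covers) is an assumption made for this example, and the name `ctmc_combine` is hypothetical. The equilibrium distribution is found by power iteration on a uniformized version of the chain, and the sketch assumes the classifiers' coverage links all classes together.

```python
def ctmc_combine(predictions, n_classes, n_iter=5000):
    """Combine specialized classifiers via a continuous-time Markov chain.

    predictions: list of (class_ids, probs) pairs, one per classifier,
    where probs[k] is the probability assigned to class_ids[k].
    Returns the equilibrium distribution over all n_classes.
    """
    # rate[i][j]: transition rate from class i to class j (assumed rule).
    rate = [[0.0] * n_classes for _ in range(n_classes)]
    for class_ids, probs in predictions:
        for i in class_ids:
            for j, p_j in zip(class_ids, probs):
                if i != j:
                    rate[i][j] += p_j

    # Uniformization: P = I + Q / lam is a discrete-time chain with the
    # same stationary distribution; lam = 2 * max exit rate keeps every
    # self-loop probability at least 0.5, so the chain is aperiodic.
    lam = 2.0 * max(sum(row) for row in rate)
    pi = [1.0 / n_classes] * n_classes
    for _ in range(n_iter):
        new = [0.0] * n_classes
        for i in range(n_classes):
            new[i] += pi[i] * (1.0 - sum(rate[i]) / lam)
            for j in range(n_classes):
                new[j] += pi[i] * rate[i][j] / lam
        pi = new
    return pi

# One classifier covers classes {0, 1}, another covers {1, 2}.
combined = ctmc_combine([([0, 1], [0.9, 0.1]), ([1, 2], [0.5, 0.5])], 3)
print(combined)
```

Under this rate rule, a single classifier that covers every class gets its own prediction back as the equilibrium distribution, which is a useful sanity check on the construction.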
- …